Spark Workflow

A typical Spark workflow consists of ingestion, processing, storage, and analytics –

  • Ingests data from sources
    • HDFS, NoSQL stores, S3, real-time sources, etc.
  • Transforms data
    • Filter, clean, join, enrich
  • Persists processed data
    • Memory, HDFS, NoSQL
  • Interactive analytics
    • Shells, Spark SQL, third-party tools
  • Machine learning
  • Action

Each of these workflow stages is explained in detail in later sections.
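
The stages above can be sketched end to end in a single Spark application. This is a minimal illustration, not a prescribed pipeline: the file paths, column names (`status`, `userId`), and data formats are hypothetical placeholders, and running it requires a Spark installation.

```scala
import org.apache.spark.sql.SparkSession

object WorkflowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WorkflowSketch")
      .master("local[*]")            // local mode for illustration
      .getOrCreate()
    import spark.implicits._

    // 1. Ingest: read raw events from a source (placeholder path)
    val raw = spark.read.json("hdfs:///data/events")

    // 2. Transform: filter, clean, and join with reference data to enrich
    val users = spark.read.parquet("hdfs:///data/users")
    val cleaned = raw
      .filter($"status" === "ok")    // filter bad records
      .na.drop(Seq("userId"))        // clean: drop rows missing the join key
      .join(users, Seq("userId"))    // enrich with user attributes

    // 3. Persist: cache in memory and write the processed data back out
    cleaned.cache()
    cleaned.write.mode("overwrite").parquet("hdfs:///data/events_clean")

    // 4. Interactive analytics: expose the data to Spark SQL
    cleaned.createOrReplaceTempView("events")
    spark.sql("SELECT userId, count(*) AS n FROM events GROUP BY userId").show()

    spark.stop()
  }
}
```

Note that the transformations are lazy; nothing is read or computed until an action such as `write` or `show` is invoked, which is the "Action" stage in the list above.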
